Triphone Statistics for Polish Language

Authors

  • Bartosz Ziółko
  • Jakub Gałka
  • Suresh Manandhar
  • Richard C. Wilson
  • Mariusz Ziółko
Abstract

A Polish text corpus was analysed to obtain phoneme statistics. We were especially interested in triphones, as they are commonly used in many speech processing applications such as the HTK speech recogniser. An attempt to create the full list of triphones for the Polish language is presented. A vast amount of phonetically transcribed text was analysed to obtain the frequency of triphone occurrences. The distribution of triphone occurrence frequencies and other phenomena are presented. The standard phonetic alphabet for Polish and methods of providing phonetic transcriptions are described.
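
As a rough illustration of the kind of counting the abstract describes, the sketch below tallies triphones over a phoneme sequence. The function name, the toy transcription, and the choice of HTK-style `left-centre+right` labels are assumptions made for this example; the paper's own tooling, corpus, and phonetic alphabet are not reproduced here.

```python
from collections import Counter


def count_triphones(phonemes):
    """Count every run of three consecutive phonemes in a sequence."""
    counts = Counter()
    # Slide a window of width three over the phoneme list.
    for left, centre, right in zip(phonemes, phonemes[1:], phonemes[2:]):
        # Label each window in the HTK-style "left-centre+right" form.
        counts[f"{left}-{centre}+{right}"] += 1
    return counts


# Hypothetical toy transcription, not taken from the paper's corpus.
sample = ["t", "a", "k", "t", "a", "k"]
stats = count_triphones(sample)
for label, n in stats.most_common():
    print(label, n)
```

Sorting the resulting counts by frequency gives a simple view of how triphone occurrences are distributed; sequences shorter than three phonemes yield no triphones.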

Similar resources

Corpus Creation for Polish Unit Selection Speech Synthesis

This paper describes the process of creating a speech corpus for Polish Unit Selection speech synthesis. This task is time-consuming, and manually designing the corpus is, in practice, only applicable in Limited Domain Speech Synthesis and Recognition. The sentence selection tools used while designing the corpus are usually based on the Greedy algorithm. The algorithm looks for sentences which cov...

Robust triphone mapping for acoustic modeling

In this paper we revisit the recently proposed triphone mapping as an alternative to decision tree state clustering. We generalize triphone mapping to Kullback-Leibler based hidden Markov models for acoustic modeling and propose a modified training procedure for the Gaussian mixture model based acoustic modeling. We compare the triphone mapping to decision tree state clustering on the Wall Stre...

Accessing Language Specific Linguistic Information for Triphone Model Generation: Feature Tables in a Speech Recognition System

This paper is concerned with a novel methodology for generating phonetic questions used in tree-based state tying for speech recognition. In order to implement a speech recognition system, language-dependent knowledge which goes beyond annotated material is usually required. The approach presented here generates phonetic questions for decision trees based on a feature table that summarizes ...

Deriving salient learners' mispronunciations from cross-language phonological comparisons

This work aims to derive salient mispronunciations made by Chinese (L1 being Cantonese) learners of English (L2 being American English) in order to support the design of pedagogical and remedial instructions. Our approach is grounded on the theory of language transfer and involves systematic phonological comparison between two languages to predict possible phonetic confusions that may lead to m...

Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition

We examine the use of Deep Neural Networks (DNN) in extracting Baum-Welch statistics for i-vector-based text-independent speaker recognition. Instead of training the universal background model using the standard EM algorithm, the components are predefined and correspond to the set of triphone states, the posterior occupancy probabilities of which are modeled by a DNN. Those assignments are then ...

Publication date: 2007